arXiv:1509.06167v2 [cs.DS] 29 Sep 2015 


Parallel Query in the Suffix Tree 


Matevž Jekovec^1 and Andrej Brodnik^{1,2}

^1 University of Ljubljana,

Faculty of Computer and Information Science, SI
{matevz.jekovec,andrej.brodnik}@fri.uni-lj.si,

WWW home page: http://lusy.fri.uni-lj.si/

^2 University of Primorska,

Faculty of Mathematics, Natural Sciences and Information Technologies, SI 


Abstract. Given a query string of length $m$, we explore parallel pattern
matching in a static suffix tree based data structure for $p \ll n$,
where $p$ is the number of processors and $n$ is the length of the text. We
present three results on CREW PRAM. The parallel query in the suffix
trie requires $O(m + p)$ work, $O(m/p + \lg p)$ time and $O(n^2)$ space in the
worst case. We extend the same technique to the suffix tree, where we
show it is inherently sequential in the worst case. Finally, we perform an
interleaved parallel query which spends $O(m \lg p)$ work, $O(\frac{m}{p} \lg p)$ time
and $O(n \lg p)$ space in the worst case.

1 Introduction

Pattern matching is one of the basic operations in text handling applications.
We want to count the occurrences of a given pattern in a text, locate them, and
retrieve the document content at a specified position. Essentially, there are two
types of algorithms for pattern matching. The first type constructs a finite
automaton based on the pattern and processes the text using this automaton.
The least amount of work for finding all the occurrences of the pattern is
bounded by the text size $n$. The second type first constructs a text index (e.g.
the suffix array [?], the suffix tree [?], or one of the compressed indexes, such
as the FM index [?]). The queries are then answered by examining the index
only (i.e. self-indexes), or a combination of the index and a subset of the original
text. The work required by these algorithms is bounded by the pattern size $m$. In
this paper we focus on the suffix tree, one of the fundamental text
indexes, which supports the pattern matching operations in $O(m)$ time.

Parallelism in text handling applications seems obvious since in practice we
need to process a large amount of data. Vishkin [55] designed the first optimal
parallel algorithm of the first kind for locating occurrences of the pattern,
which requires $O(n/p)$ time on CRCW PRAM for $p < n/\log m$. Later,
work-optimal and work-time-optimal parallel string algorithms were introduced on
EREW PRAM by Czumaj, Galil and others [6]. Research on parallel algorithms
of the second kind mostly focused on efficient construction of the text index.
Farach-Colton et al. [?] provided theoretical ground for optimal parallel suffix
tree construction in the parallel disk-access model of computation. The latest
practical algorithms for suffix tree construction with different worst case
theoretical bounds include, for example, ERA [?] and Parallel Cartesian Tree [21].
Researchers have also worked on both practically and theoretically sound
algorithms for the suffix array and longest common prefix array construction [?].
In terms of the query speed, research mostly focused on reducing cache misses
when navigating the tree. Clark and Munro worked on succinct cache-efficient
suffix trees. Ferragina and Grossi introduced String B-trees [10] requiring
optimal $O(\log_B n)$ cache misses in the worst case for any query, where $B$ is the
block size in the external memory model. Brodal [3] designed a cache-oblivious
variant of a data structure requiring the same optimal cache complexity. Notice the
parallel trie navigation presented in this paper could also use a cache-oblivious
organization of the trie in the backend. Demaine et al. [?] showed that an arbitrary
trie cannot be laid out in memory and incur $\Omega(\log_B n)$ cache misses on queries
in the worst case without some redundancy: Ferragina and Grossi propagated
every $O(B)$-th value to the upper levels, while Brodal used multiple layers of
giraffe trees and bridges on top of the original trie.

In this paper we present a parallel query technique on a suffix trie and extend
it to the suffix tree. Then, we explore a completely different approach to the
query which employs interleaving the subqueries. Throughout the paper we will
use the CREW PRAM model with p < 2m processors. Without loss of generality we
assume $p = 2^x$ for some integer $x$. The space complexity is expressed in words,
if not stated otherwise. Our contributions are:

1. A highly scalable parallel query algorithm in a suffix trie requiring $O(m + p)$
work, $O(m/p + \lg p)$ time and $O(n^2)$ space in the worst case.

2. A proof that the parallel query approach used in a suffix trie achieves an
inherently sequential execution time in the suffix tree.

3. A parallel query algorithm in a layered interleaved suffix tree requiring
$O(m \lg p)$ work, $O(\frac{m}{p} \lg p)$ time and $O(n \lg p)$ space in the worst case.

To the best of our knowledge, this is the first result on the parallel queries
in suffix tree based data structures employing $p \ll n$.

2 Notation and Preliminaries 

We denote by $T$ the input text consisting of $n$ characters from a finite alphabet $\Sigma$.
Further, we denote by $Q$ the query string of size $m$. We enumerate the elements
of a list, an array and a string starting at 1. By $p$ we denote the number of
processors.

$X[i]$ denotes the $i$-th character of a string $X$, while $X[i_1, i_2]$ denotes the substring
of $X$ ranging from the position $i_1$ up to and including the position $i_2$. If $i_1 > i_2$, the
resulting substring is empty. By $X[i, \cdot]$ we denote the substring $X[i, |X|]$, that is
the suffix of $X$ starting at position $i$. If $X$ and $Y$ are strings, then $XY$ is their
concatenation.

2.1 Trie and Patricia trie 

A trie [?] is an ordered tree data structure used to store string-based keys.
Every edge corresponds to one character. Each node in a trie represents a string
of characters corresponding to the path from the root to this node. A trie can
be used as a set, or as a dictionary, if we extend nodes to contain some value.
The space complexity of a trie is $O(n \cdot l)$, where $n$ is the number of keys and $l$
the length of the longest key. By child($\omega$, $c$) we denote a child of a node $\omega$ by
taking an outgoing edge labeled by a character $c$, and by parent($\omega$) we denote
$\omega$'s parent. Further, $\pi(\omega)$ denotes the sequence of nodes on the path from the root
to the node $\omega$. Similarly, $\pi(\omega_1, \omega_2)$ denotes the sequence of nodes on the path from
node $\omega_1$ to node $\omega_2$ in the subtree of $\omega_1$.

The Patricia tree [19] is a path compressed trie, where a chain of nodes with a
single child is merged into the final node of the chain. This node is either a leaf,
or a node with more than one child. Each edge now corresponds to a string of
characters instead of a single character. If we store two keys where the first key is
a prefix of the second one, we need to add a unique delimiter character or a
unique sequence of characters at the end of the first key in order to discriminate
between the two keys. Each node $\omega$ stores the first (discriminator) character
and the length of the incoming edge's substring, skipvalue($\omega$) $< n$. We define
a cumulative skip value, $|\omega|$, for node $\omega$ as the sum of all skip values from the
root to $\omega$, including $\omega$ itself. The cumulative skip value is used during the query
to determine which character of the query string needs to be compared to the
character stored at the current edge in a patricia tree. The query is finished
when we reach the first node with the cumulative skip value greater than the
query length, or a leaf. Obviously, if any skipvalue $> 1$, we must compare the
query string to any of the suffixes stored in the resulting subtree to ensure the
existence of the pattern in the text.

Assuming the keys are of constant size, a patricia tree takes $O(n)$ space. Both
the trie and the patricia tree have $O(m)$ sequential query time.

2.2 Suffix trie and Suffix tree 

The suffix trie is a trie-based dictionary storing each suffix of the input text $T$,
where each leaf $\nu$ stores ref($\nu$) and represents the suffix $T[\mathrm{ref}(\nu), \cdot]$. The root node
represents the empty string. We say a node in a suffix trie corresponds to exactly
one substring in the text, occurring one or more times and consisting of the characters
on the path from the root to that node. Since the suffix $X_i = T[i, \cdot]$ is $n - i + 1$
characters long, the space complexity of the suffix trie is
$\sum_{i=1}^{n} |X_i| \in O(n^2)$.
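
The quadratic growth can be observed on a toy input. The following Python sketch is only an illustration under stated assumptions, not the construction used in the paper: `TrieNode`, `build_suffix_trie`, `count_nodes` and `find` are hypothetical names, and the terminator `$` is assumed not to occur in the text.

```python
class TrieNode:
    """One node of an uncompressed suffix trie (hypothetical helper class)."""

    def __init__(self, depth=0):
        self.children = {}   # character -> TrieNode
        self.depth = depth   # length of the substring this node represents
        self.ref = None      # 1-based start of the suffix that ends in this node


def build_suffix_trie(text):
    """Insert every suffix of text + '$' character by character (quadratic work)."""
    text += "$"              # assumption: '$' does not occur in text
    root = TrieNode()
    for start in range(len(text)):
        node = root
        for ch in text[start:]:
            if ch not in node.children:
                node.children[ch] = TrieNode(node.depth + 1)
            node = node.children[ch]
        node.ref = start + 1
    return root


def count_nodes(node):
    """Number of nodes in the subtree rooted at node."""
    return 1 + sum(count_nodes(child) for child in node.children.values())


def find(root, query):
    """Sequential query: follow one edge per character, None if a branch is missing."""
    node = root
    for ch in query:
        node = node.children.get(ch)
        if node is None:
            return None
    return node
```

For a text of $n$ pairwise distinct characters no two suffixes share a prefix, so the sketch creates one node per suffix character plus the root, which is the quadratic worst case.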

The suffix tree is a path compressed version of the suffix trie. Let $\omega$ denote a
node in the suffix tree and let $\nu$ be the left-most leaf of the subtree rooted
at $\omega$. We say $\omega$ corresponds to substrings $A_i = T[\mathrm{ref}(\nu), \mathrm{ref}(\nu) + i]$ for all
$i \in \{|\mathrm{parent}(\omega)| + 1 \ldots |\omega|\}$, where the frequency of each substring in the text is
equal. Obviously, if $\omega$ is a leaf, then $\omega$ corresponds to a suffix $A = T[\mathrm{ref}(\omega), \cdot]$.
As usual, instead of the text itself, we can store constant size text references
into the nodes of the suffix tree, so the space complexity of the data structure
is $O(n)$.

In the suffix tree we define a suffix link from node $\alpha'$ to node $\alpha$ iff their
longest corresponding substrings are in relation $A = A'[2, \cdot]$. Since we inserted
all the suffixes of the text, there exists an outgoing suffix link from each node in
the suffix tree, except for the root node. Analogously, each node, except for the
deepest leaf corresponding to $T$, contains an incoming suffix link.


2.3 Perfect Hash Dictionary 

A perfect hash dictionary is a hash table which requires $O(d)$ space for storing
$d$ elements and uses $O(1)$ time in the worst case to answer whether an element
is in the dictionary. A static two-level perfect hash table has been introduced in
[?].


3 Parallel query in a Suffix Trie

We begin by introducing the fundamental suffix trie property used during the 
parallel query method. 

Lemma 1. Let $Q$ be a substring of $T$ and let the node $\omega$ in $T$'s suffix trie
correspond to $Q$. Then, there exists a halving pair of nodes $(\alpha_1, \alpha_2)$ corresponding
to the substrings $A_1 = Q[1, \lceil |Q|/2 \rceil]$ and $A_2 = Q[\lceil |Q|/2 \rceil + 1, \cdot]$ respectively, where
$Q = A_1 A_2$.

Proof. Since $Q$ is a substring of $T$, so are $A_1$ and $A_2$. The suffix trie contains all the
substrings of the text, so $\alpha_1$ and $\alpha_2$ exist in the suffix trie (see Figure 1). □

The lemma provides a theoretical background for correct and efficient
concatenation of two substrings of the text represented as nodes in the suffix trie.
A consequence of Lemma 1 is that for each node $\omega$ in the suffix trie, there
exists exactly one halving pair $\alpha_1, \alpha_2$ such that $|\alpha_1| = |\alpha_2|$ for even $|\omega|$, and
$|\alpha_1| = |\alpha_2| + 1$ for odd $|\omega|$. After the suffix trie is constructed, we create a
dictionary which maps a halving pair $(\alpha_1, \alpha_2) \rightarrow \omega$ for every $\omega$ in the suffix trie.
Notice the order of nodes $\alpha_1$ and $\alpha_2$ is important and reflects the order of the
concatenated substrings.

The parallel query in the suffix trie is provided by Algorithm 1. To answer
the query $Q$ we split it among $p$ processors (see Algorithm 2). Each processor
performs the query operation in the original, sequential suffix trie with its
assigned subquery, navigating the trie independently. If any processor
fails to take the corresponding branch during the search, the query does
not appear in the text and we return an empty set of results. Otherwise, all
processors successfully find the nodes corresponding to their respective subqueries
in time $O(m/p)$.

Fig. 1. The suffix trie. a) $\alpha_1$ and $\alpha_2$ are the halving pair of a substring which $\omega$
corresponds to. Gray lines mark the equivalence of the substrings obtained by $\pi(\alpha_2)$
and $\pi(\alpha_1, \omega)$. b) The mapping procedure of nodes $(\alpha_1, \alpha_2) \rightarrow \omega$. Notice $\pi(\alpha_1, \omega)$ is
never navigated.


Algorithm 1: Parallel query in the suffix trie employing $p$ processors.
Input: query string $Q$, suffix trie $\tau$, number of processors $p$

    $Q_1 \ldots Q_p \leftarrow$ assign_p($Q$, $p$)
    for $i \in \{1 \ldots p\}$ do
        $\alpha_i \leftarrow$ root node of $\tau$
    for $i \in \{1 \ldots p\}$ do in parallel
        while $|\alpha_i| < |Q_i|$ do
            $\alpha_i \leftarrow$ child($\alpha_i$, $Q_i[|\alpha_i|]$)
    for $j \in [1, 2, 4 \ldots p/2]$ do
        for $i = \{0 \ldots p/j - 1\}$ do in parallel
            $\alpha_i \leftarrow$ probe($(\alpha_{ij+1}, \alpha_{ij+j+1})$)
    return $\alpha_1$

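Let $\alpha_i$ be the resulting node of processor $i$ and $\omega$ the node corresponding to
the query string $Q$. Our goal is to find the node $\omega$ from the intermediate nodes $\alpha_i$ for
$i = \{1 \ldots p\}$. We concatenate the substrings each $\alpha_i$ corresponds to pairwise:
$\alpha_i^{(1)} = (\alpha_i, \alpha_{i+1})$ for every odd $i$. To obtain the whole path to $\omega$, we continue
the concatenation recursively, $\alpha_i^{(2)} = (\alpha_i^{(1)}, \alpha_{i+2}^{(1)})$ for all $i \bmod 4 = 1$, and in general
$\alpha_i^{(j)} = (\alpha_i^{(j-1)}, \alpha_{i+2^{j-1}}^{(j-1)})$ for all $i \bmod 2^j = 1$, where the concatenations of
each step are done in parallel. After $\lg p$ steps, we have reconstructed the whole path
and obtained $\omega$.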
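
The dictionary of halving pairs and the $\lg p$ rounds of pairwise concatenation can be sketched sequentially in Python. This is a hedged illustration rather than the paper's implementation: all function names are hypothetical, a Python `dict` stands in for the perfect hash dictionary, the subqueries are produced by recursive halving (one way to guarantee that every pairwise concatenation is a halving pair), $p$ is assumed to be a power of two, and the query is assumed to be at least $p$ characters long.

```python
def build_suffix_trie(text):
    """children[v] maps a character to a child id; node 0 is the root."""
    children = [{}]
    for start in range(len(text)):
        v = 0
        for ch in text[start:]:
            if ch not in children[v]:
                children.append({})
                children[v][ch] = len(children) - 1
            v = children[v][ch]
    return children


def walk(children, s, v=0):
    """Follow s from node v; return the node reached, or None if a branch is missing."""
    for ch in s:
        v = children[v].get(ch)
        if v is None:
            return None
    return v


def halving_dictionary(children):
    """Map (alpha_1, alpha_2) -> omega for every non-root node omega."""
    table = {}
    stack = [(0, "")]
    while stack:
        v, s = stack.pop()
        if s:
            half = (len(s) + 1) // 2                 # ceil(|Q| / 2)
            table[(walk(children, s[:half]), walk(children, s[half:]))] = v
        for ch, child in children[v].items():
            stack.append((child, s + ch))
    return table


def split(query, p):
    """Recursively halve query into p pieces; the left piece takes the ceiling."""
    if p == 1:
        return [query]
    half = (len(query) + 1) // 2
    return split(query[:half], p // 2) + split(query[half:], p // 2)


def parallel_query(children, table, query, p):
    """Simulate the query; each list entry plays the role of one processor."""
    assert len(query) >= p, "sketch assumes at least one character per processor"
    nodes = [walk(children, part) for part in split(query, p)]
    if any(v is None for v in nodes):
        return None                                  # some subquery is not in the text
    while len(nodes) > 1:                            # lg p merge rounds
        merged = []
        for i in range(0, len(nodes), 2):
            omega = table.get((nodes[i], nodes[i + 1]))
            if omega is None:
                return None                          # concatenation is not a substring
            merged.append(omega)
        nodes = merged
    return nodes[0]
```

Note that a failed probe is meaningful: both halves may occur in the text while their concatenation does not, in which case the pair is simply absent from the dictionary.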

Time and Work Complexity. The subquery lengths can be determined in constant
parallel time. The parallel suffix trie navigation requires at most $O(m/p)$
parallel steps and $O(m)$ work in total. Final concatenation of $p$ subqueries
requires $O(p)$ work and can be done in deterministic constant time per
concatenation by using the perfect hash table. Using $p/2$ processors we can map $p$
subqueries to $p/2$ resulting nodes in a single step. Calculating the final node $\omega$
corresponding to the original query is done in $\lg p$ steps. The whole query requires
$O(m + p)$ work and $O(m/p + \lg p)$ time. Concurrent reading is only required if
two processors traverse the same path in the suffix trie or access the same key
in the hash table.

Algorithm 2: Implementation of assign_p.
Input: query string $Q$, number of processors $p$

    for $i \in \{1 \ldots p\}$ do in parallel
        $j_i \leftarrow$ ReverseBits($i - 1$)    // MSB <-> LSB
        $len_i \leftarrow \lfloor |Q|/p + (\ldots \bmod p) \rfloor$    // $len_i$ equals $\lfloor |Q|/p \rfloor$ or $\lceil |Q|/p \rceil$
        $pos_i \leftarrow$ ParPrefixSum($len_i$, $p$)
    $Q_1 \leftarrow Q[1, len_1]$
    for $i \in \{2 \ldots p\}$ do in parallel
        $Q_i \leftarrow Q[pos_{i-1}, pos_{i-1} + len_i]$
    return $Q_1 \ldots Q_p$
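
How the subquery lengths and positions might be computed can be illustrated with a short Python sketch. It is an assumption-laden reading, not a transcription of Algorithm 2: `halving_lengths` derives the lengths by recursive halving, `bit_reversed_lengths` is a hypothetical closed form in which a piece gets one extra character iff its bit-reversed index is smaller than $|Q| \bmod p$, and the parallel prefix sum is replaced by the sequential `itertools.accumulate`. Pieces are indexed from 0 here.

```python
from itertools import accumulate


def halving_lengths(m, p):
    """Lengths of p subqueries from recursive halving; the left part takes the ceiling."""
    if p == 1:
        return [m]
    left = (m + 1) // 2
    return halving_lengths(left, p // 2) + halving_lengths(m - left, p // 2)


def reverse_bits(i, width):
    """Reverse the lowest `width` bits of i."""
    r = 0
    for _ in range(width):
        r = (r << 1) | (i & 1)
        i >>= 1
    return r


def bit_reversed_lengths(m, p):
    """Hypothetical closed form: piece i gets an extra character iff rev(i) < m mod p."""
    width = p.bit_length() - 1          # lg p, for p a power of two
    return [m // p + (reverse_bits(i, width) < m % p) for i in range(p)]


def assign(query, p):
    """Cut query into p consecutive pieces using prefix sums of the lengths."""
    lengths = halving_lengths(len(query), p)
    ends = list(accumulate(lengths))    # sequential stand-in for a parallel prefix sum
    return [query[end - length:end] for length, end in zip(lengths, ends)]
```

Both length rules produce only $\lfloor |Q|/p \rfloor$ or $\lceil |Q|/p \rceil$, in line with the comment in Algorithm 2, and the closed form needs no communication between processors.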


Space Complexity. When calculating the subquery lengths, we simultaneously
need to keep $p$ words. Navigating the suffix trie requires constant space per
processor. The dictionary size is one record for every node in the suffix trie.
Overall we use $O(n^2)$ words, as does the original suffix trie, which gives us:


Theorem 1. Parallel query in the suffix trie requires $O(m/p + \lg p)$ time on CREW
PRAM, $O(n^2)$ space and $O(m + p)$ work.


4 Parallel query in Suffix Tree

The space reduction from $O(n^2)$ nodes in the suffix trie to $O(n)$ nodes in the
suffix tree comes from removing the nodes with a single child (path compression).
Consequently, Lemma 1 does not hold anymore since the node $\alpha_1$ of the halving
pair might only have a single child in the suffix trie and it will not exist in the
suffix tree.

In this section we correctly solve issues originating in the path compression 
and present a parallel query algorithm in the suffix tree. We begin by redefining 
the halving pair we store in the node mapping dictionary. Then we define the 
parallel query for p = 2. Finally, we evaluate the work and time, and we find an 
example where the presented technique is inherently sequential in the worst case. 
Throughout this section we provide pseudocode to better describe our approach. 
To improve readability of the code however, we do not use any parallel constructs, 
but discuss the parallel implementation afterwards. 








4.1 Preprocessing 

We slightly loosen the equal work manner by extending the left subquery $A_1$
from Lemma 1 until we find a corresponding node $\alpha_1$ which exists in the suffix
tree. Since the left subquery was extended, the right subquery $A_2$ needs to be
shortened from the left accordingly. We will provide more details on subquery
extension and shortening in the next section. First, we prove the existence of the
new halving pair.

Lemma 2. For every node $\omega$ corresponding to a substring $Q$ in the suffix tree,
there exists a halving pair $\alpha_1, \alpha_2$ corresponding to the substrings $A_1 = Q[1, a]$
and $A_2 = Q[a + 1, \cdot]$ respectively for the smallest $a \ge \lceil |Q|/2 \rceil$ in the suffix tree,
where $Q = A_1 A_2$.

Proof. We start by assigning $\alpha_1 = \omega$ and moving from $\omega$ towards the root until
we reach the node corresponding to the smallest $a \ge \lceil |Q|/2 \rceil$. In the worst case,
we did not leave the initial node $\omega$ and $\alpha_1 = \omega$. Otherwise, $\alpha_1$ is one of $\omega$'s
ancestors.

If node $\omega$ exists in the suffix tree, then $\omega$ is a leaf of the suffix tree, or has
at least two children. Since the suffix tree contains all the suffixes, including
the suffixes of $Q$, there exists a node $\alpha_2$ corresponding to the substring
$A_2 = Q[a + 1, \cdot]$. $\alpha_2$ remains a leaf if $\omega$ was a leaf, or has at least as many
children as $\omega$ had. □

We showed the halving pair $\alpha_1, \alpha_2$ exists in the suffix tree for every node
$\omega$. Next, we must decide which substring of the resulting node $\omega$ in the suffix
tree the halving pair will split. Recall the suffix tree definition: $\omega$ corresponds
not to one substring, as in the suffix trie, but to skipvalue($\omega$) $\in O(n)$ substrings. If
we stored every corresponding halving pair of the potential queries in our node
mapping dictionary, we would spend $O(n)$ space for a single node and $O(n^2)$
space for all the nodes, eliminating the storage savings introduced by the suffix
tree. Instead, we can only store a constant number of halving pairs per node
in the suffix tree. To decide which pair to store, we begin with a lemma which
captures an important consequence of the path compression in the suffix tree.

Lemma 3. Let $\alpha, \alpha'$ be nodes in the suffix tree and $A, A'$ their longest
corresponding substrings, such that $A' = CA$ for some prefix $C$. Also, let $\gamma$ be a
node in the suffix tree, such that its longest corresponding substring is $C$. Then,
$|\pi(\gamma, \alpha')| \le |\pi(\alpha)|$.

Proof. By contradiction. Assume $|\pi(\gamma, \alpha')| > |\pi(\alpha)|$, then there exists a node
$\beta$ such that $\beta \in \pi(\gamma, \alpha')$ and $\beta \notin \pi(\alpha)$. We denote by $B$ the longest substring
corresponding to $\beta$. There exist two substrings $Bc_1$ and $Bc_2$ in the text for some
characters $c_1, c_2$ which $\beta$ discriminates between. Since the suffix tree contains
all the suffixes of the text, it also contains a suffix of $Bc_1$ and a suffix of $Bc_2$,
and the nodes corresponding to these suffixes exist in $\pi(\alpha)$. This contradicts the
initial assumption, since such $\beta$ does not exist. □

Proposition 1. For every node $\omega$ in the suffix tree and its corresponding
substring $Q$, we store a pair $\alpha_1, \alpha_2$ corresponding to the substrings $A_1 = Q[1, a]$ and
$A_2 = Q[a + 1, \cdot]$ respectively for the smallest $a \ge \lceil |Q|/2 \rceil$ in the suffix tree.

Suppose we store the mapping $(\alpha_1, \alpha_2) \rightarrow \omega$ as defined in Proposition 1 for every
node $\omega$ in the suffix tree. In Lemma 3 we showed $|\pi(\alpha_2)| \ge |\pi(\alpha_1, \omega)|$. It follows that
if a query is shorter than $|\omega|$, the right subquery might end, instead of in $\alpha_2$, in
one of the ancestors of $\alpha_2$, and we will not be able to find the mapping to $\omega$ in
the dictionary. Notice however, if we navigated the tree sequentially, we would
successfully find $\omega$, since $|\pi(\alpha_1, \omega)|$ is smaller.

In order to correctly answer the query of any length corresponding to $\omega$, we
store the halving pair of the shortest substring corresponding to $\omega$, that is of
length $|\omega| - \mathrm{skipvalue}(\omega) + 1$.

Proposition 2. For every node $\omega$ in the suffix tree and its corresponding
substring $Q$, we store a halving pair $\beta_1, \beta_2$ corresponding to the substrings $B_1 =
Q[1, b]$ and $B_2 = Q[b + 1, |\omega| - \mathrm{skipvalue}(\omega) + 1]$ respectively for the smallest
$b \ge \lceil (|\omega| - \mathrm{skipvalue}(\omega) + 1)/2 \rceil$.

Denote $\beta_1$, $\beta_2$, and $\omega$ as in Proposition 2. In this case, a query pattern
longer than $|\beta_1| + |\beta_2|$ requires special attention. Suppose the query is of
length $|\omega|$. Observe $|\beta_1| + |\beta_2| > |\omega|/2$, because if the query were shorter we
would have ended at one of the ancestors of $\omega$. Since we are not aware of which
halving pair $\beta_1, \beta_2$ is stored in the dictionary, we gradually probe all feasible
candidates during the navigation and remember the deepest one which existed
in the dictionary. Figure 2 illustrates the query layout of Propositions 1 and 2 and
the underlying suffix tree. In the rest of this section, we provide a parallel
implementation of this gradual probing, and discuss the level of parallelism.



Fig. 2. a) Illustrated Propositions 1 and 2. The vertical dots in the string illustrate the
position of $|\omega|/2$. b) Illustrated Propositions on the suffix tree. c) Subquery assignment
in the adapted query method.












4.2 Query 


The query operation needs to find the deepest halving pair $\beta_1, \beta_2$ and map
$(\beta_1, \beta_2) \rightarrow \omega$, such that the query string will correspond to one of skipvalue($\omega$)
substrings. Acknowledging Proposition 2 and storing the halving pair of the
shortest substring corresponding to $\omega$, the halving pairs of the following prefixes
of $Q$ are potentially stored in the dictionary:

$Q[1, \lfloor |Q|/2 \rfloor + 1], Q[1, \lfloor |Q|/2 \rfloor + 2], \ldots, Q$.

We do not consider prefixes shorter than $\lfloor |Q|/2 \rfloor + 1$ characters since they
correspond to ancestors of $\omega$ due to Lemma [?]. Notice the halving pairs of multiple
prefixes can be stored in the dictionary. In this case we need to consider the pair
with the largest sum of cumulative skip values of the nodes in the pair.

Performing a separate query for each prefix defined above would require
$\Omega(|Q|^2)$ work overall. We use a more work efficient approach by taking
advantage of the suffix tree. We show how to perform a constant number of
passes over $Q$ and check the relevant prefixes.

First, we assign the query string to the left and the right processor as follows.
The shortest prefix of the query as defined above is $Q[1, \lfloor |Q|/2 \rfloor + 1]$.
By acknowledging Lemma [?], its corresponding halving pair potentially stored
in the dictionary corresponds to the subqueries $B_1 = Q[1, \lceil |Q|/4 \rceil + x]$ and
$B_2 = Q[\lceil |Q|/4 \rceil + x + 1, \lfloor |Q|/2 \rfloor + 1]$. $x$ depends on the shape of the suffix tree
and we cannot obtain it in advance. In the worst case $x = 0$, so we assign the
right subquery $Q_2 = Q[\lceil |Q|/4 \rceil + 1, \cdot]$ while keeping the original left subquery
$Q_1 = Q[1, \lceil |Q|/2 \rceil]$ as we did in the parallel suffix trie. Notice $Q_2$ overlaps $Q_1$
for $\lfloor |Q_1|/2 \rfloor$ characters; however, we never concatenate $Q_1$ and $Q_2$ the way they
were assigned. Instead we shorten the right subquery accordingly during the
query procedure so the resulting concatenation of the subqueries is always a
valid prefix of $Q$. In order to process the longest prefix, $Q$, the right ends of $Q_1$
and $Q_2$ remain $\lceil |Q|/2 \rceil$ and $|Q|$ respectively. Figure 2c illustrates the subquery
assignment.
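
The index arithmetic of this assignment is easy to get wrong, so a small Python sketch may help. The names `assign_two` and `candidate_prefixes` are hypothetical, and Python's 0-based slices are used to express the paper's 1-based, inclusive ranges.

```python
def assign_two(query):
    """Return (Q_1, Q_2, overlap) for the two-processor query on the suffix tree."""
    m = len(query)
    half = (m + 1) // 2          # ceil(m / 2)
    quarter = (m + 3) // 4       # ceil(m / 4)
    q1 = query[:half]            # Q[1, ceil(m/2)]
    q2 = query[quarter:]         # Q[ceil(m/4) + 1, .]
    overlap = half - quarter     # characters covered by both subqueries
    return q1, q2, overlap


def candidate_prefixes(query):
    """Prefixes of Q whose halving pair may be stored in the dictionary."""
    m = len(query)
    return [query[:k] for k in range(m // 2 + 1, m + 1)]
```

For the query ABRACADABRA of length 11 the sketch assigns ABRACA to the left processor and ACADABRA to the right one; the two share the three characters ACA, and six candidate prefixes remain to be probed.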

During the initial parallel navigation, we need to skip the prefixes shorter
than $\lfloor |Q|/2 \rfloor + 1$ characters. Each processor navigates the suffix tree
independently according to its assigned subquery and stops when
it has navigated $\lfloor |Q|/4 \rfloor + 1$ characters. If the cumulative skip value of
a processor's node is not exactly $\lfloor |Q|/4 \rfloor + 1$, the processor stops at the last
node whose cumulative skip value is strictly smaller than $\lfloor |Q|/4 \rfloor + 1$.

What follows is the core of the query algorithm, which repeats the following two steps.
The navigation step, which we will describe later, takes a child corresponding to
the next character of either the left and the right subquery or only the right
subquery. The probe step effectively concatenates the substrings
corresponding to the current nodes and checks whether the node corresponding to
the concatenation exists in the dictionary. We will denote by $\beta_1$ and $\beta_2$ the
current nodes reached by the left and the right processor respectively. In the first
iteration we probe a halving pair $\beta_1, \beta_2$ in the dictionary, which corresponds to
the shortest candidate prefix $Q[1, \lfloor |Q|/2 \rfloor + 1]$. We check whether such a
mapping exists in the dictionary and, if it does, we remember the resulting node
$\omega$. We continue with the next iteration of the navigation step and probe the
dictionary for the new $\beta_1, \beta_2$. The algorithm continues until the cumulative sum
$|\beta_1| + |\beta_2| \ge |Q|$. If any of the navigation steps fails due to a non-existing edge
in the suffix tree for the next character, the pattern does not exist in the text
and we return the empty result set. Also, if a leaf $\nu$ was reached during the
navigation step, at most a single occurrence of $Q$ exists in the text at position
ref($\nu$), if the left subquery reached the leaf, or at position ref($\nu$) $- |\beta_1|$, if the
right subquery reached the leaf. Algorithm 3 formally defines our approach.


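Algorithm 3: Parallel query in the suffix tree for $p = 2$.
Input: query string $Q$, suffix tree $\tau$

    $\beta_1 \leftarrow$ root node of $\tau$
    $\beta_2 \leftarrow$ root node of $\tau$
    $Q_1 = Q[1, \lceil |Q|/2 \rceil]$
    $Q_2 = Q[\lceil |Q|/4 \rceil + 1, \cdot]$
    while $|\beta_1| + |\beta_2| < |Q|$ do
        $(\beta_1, \beta_2) \leftarrow$ NavigateOne($\beta_1$, $\beta_2$, $Q_1$, $Q_2$, $\tau$)
        if $|\beta_1| + |\beta_2| < \lfloor |Q|/2 \rfloor$ then
            continue
        $\omega' \leftarrow$ probe($(\beta_1, \beta_2)$)
        if $\omega' \ne \emptyset$ then
            $\omega \leftarrow \omega'$
    return $\omega$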


When we finish the query procedure and obtain the resulting node $\omega$, we
need to check the characters skipped during the navigation due to the path
compression used in the suffix tree. This step is already present in the original
suffix tree data structure and is omitted in Algorithm 3. To check the skipped
characters in parallel we access one of the leaves $\nu$ in $\omega$'s subtree. Then,
we use a parallel scan where the first processor compares $T[\mathrm{ref}(\nu), \mathrm{ref}(\nu) +
\lceil |Q|/2 \rceil]$ to $Q[1, \lceil |Q|/2 \rceil]$, and the second processor compares $T[\mathrm{ref}(\nu) + \lceil |Q|/2 \rceil +
1, \mathrm{ref}(\nu) + |Q|]$ to $Q[\lceil |Q|/2 \rceil + 1, \cdot]$. If both subqueries match the text, the query
string truly exists in the text and we report all the leaves in $\omega$'s
subtree.
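
The verification of the skipped characters by two processors can be sketched as follows. This is only an illustration under assumptions: `verify` is a hypothetical name, `start` is a 0-based offset playing the role of ref($\nu$), and two Python threads stand in for the two PRAM processors (they show the independence of the two comparisons, not a real speed-up).

```python
from concurrent.futures import ThreadPoolExecutor


def verify(text, start, query):
    """Check that query occurs in text at offset start, comparing two halves independently."""
    m = len(query)
    half = (m + 1) // 2                      # ceil(|Q| / 2)

    def same(lo, hi):
        # Compare one half of the query with the matching slice of the text.
        return text[start + lo:start + hi] == query[lo:hi]

    with ThreadPoolExecutor(max_workers=2) as pool:
        left = pool.submit(same, 0, half)    # first processor: left half
        right = pool.submit(same, half, m)   # second processor: right half
        return left.result() and right.result()
```

The two comparisons read disjoint parts of the query and of the text, so no concurrent access to the same memory cell is needed in this step.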

Next, we describe the navigation step in more detail. Assume $\beta_1$ and $\beta_2$ are
the current nodes reached by the first and the second processor respectively
according to the assigned subqueries, and $B_1$ and $B_2$ are the longest substrings
corresponding to $\beta_1$ and $\beta_2$. The invariant of the navigation step is the following:
$B_1 B_2 = Q[1, |B_1| + |B_2|]$. We denote by $d_1$ and $d_2$ the skip value of the next $\beta_1$'s
and $\beta_2$'s child respectively. If $|\beta_1| \ge |\beta_2| + d_2$, the navigation step will follow the
edge to $\beta_2$'s child, extend the right subquery $B_2' = B_2 Q[|B_1| + |B_2| + 1, |B_1| +
|B_2| + d_2]$ while keeping $B_1$ intact and finish. Analogously, if $|\beta_1| < |\beta_2| + d_2$, the
navigation step will follow the edge to $\beta_1$'s child and extend the left subquery
$B_1' = B_1 Q[|B_1| + 1, |B_1| + d_1]$. In this case however, $B_1'$ now overlaps $B_2$ for
$d_1$ characters, or if $d_1 > |B_2|$, $B_1'$ covers the whole subquery $B_2$ and more. In
the latter case, $B_2$ becomes an empty string corresponding to the root of the
suffix tree and we are done. In the former case, in order to produce a correct
concatenation of the subqueries, $B_2$ needs to be shortened from the left for $d_1$
characters. Shortening a substring corresponding to a node in the suffix tree can
be done by following the suffix link from that node. By recursively following the
suffix links we can shorten the substring for an arbitrary number of characters. In
the next paragraph we show how to perform this operation in $O(1)$ time. After
the right subquery was shortened, $B_2' = B_2[d_1 + 1, |B_2|]$. Notice probing for the
halving pair corresponding to $B_1'$ and $B_2'$ is not feasible, since $|B_1'| + |B_2'| =
|B_1| + |B_2|$ and the previous $(\beta_1, \beta_2)$ was already probed in the last iteration.
Instead, we continue with the next iteration of the navigation step. Algorithm 4
formally defines the described navigation step and Figure 3 illustrates the whole
procedure.


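Algorithm 4: The navigateOne function.
Input: current nodes $\beta_1$, $\beta_2$, subqueries $Q_1$, $Q_2$

    $d_2 \leftarrow$ skipvalue($\beta_2$)
    while $|\beta_1| < |\beta_2| + d_2$ do
        $d_1 \leftarrow$ skipvalue($\beta_1$)
        $\beta_1 \leftarrow$ child($\beta_1$, $Q_1[|\beta_1|]$)
        if $d_1 > |\beta_2|$ then
            $\beta_2 \leftarrow$ root node of $\tau$
            return $(\beta_1, \beta_2)$
        $\beta_2 \leftarrow$ shorten($\beta_2$, $d_1$)
        $d_2 \leftarrow$ skipvalue($\beta_2$)
    $\beta_2 \leftarrow$ child($\beta_2$, $Q_2[|\beta_2|]$)
    return $(\beta_1, \beta_2)$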


To perform $d$ steps over the suffix links in constant time we use the following
method. We define the suffix links tree such that for each node in the suffix tree,
there exists one node in the suffix links tree. An edge between two nodes in
the suffix links tree exists if there exists a suffix link between the corresponding
nodes in the suffix tree. Also, each node in the suffix tree contains a reference
to the node in the suffix links tree, and the other way around. Observe that the
node corresponding to the substring shortened by $d$ characters from the left is
the $d$-th ancestor of the original node in the suffix links tree. We use the level ancestor
technique to find the ancestor in constant time and maintain linear space (for
details, consult [1]). The shorten function used in Algorithm 4 employs this
method.
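
The role of the suffix links tree can be illustrated with a deliberately simpler stand-in for the level ancestor structure. The Python sketch below uses jump pointers (binary lifting), which answer a query in $O(\lg d)$ time with $O(n \lg n)$ space instead of the constant time and linear space of the technique cited above. It is also built on the uncompressed suffix trie for brevity; all names are hypothetical.

```python
def build_suffix_trie(text):
    """Return (children, label): child maps and the substring of each node; node 0 is the root."""
    children, label = [{}], [""]
    for start in range(len(text)):
        v = 0
        for ch in text[start:]:
            if ch not in children[v]:
                children.append({})
                label.append(label[v] + ch)
                children[v][ch] = len(children) - 1
            v = children[v][ch]
    return children, label


def suffix_links(label):
    """link[v] is the node of label[v] without its first character; the root links to itself."""
    index = {s: v for v, s in enumerate(label)}
    return [index[s[1:]] for s in label]


def jump_table(link):
    """up[k][v] is the 2^k-th ancestor of v in the suffix links tree."""
    up = [link]
    while (1 << len(up)) < len(link):
        prev = up[-1]
        up.append([prev[prev[v]] for v in range(len(link))])
    return up


def shorten(up, v, d):
    """Node reached from v after following d suffix links (stops at the root)."""
    d = min(d, len(up[0]) - 1)   # no node is deeper than the number of nodes
    k = 0
    while d:
        if d & 1:
            v = up[k][v]
        d >>= 1
        k += 1
    return v
```

For the text banana, shortening the node of anan by two characters yields the node of an, and shortening by four or more characters yields the root.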

Fig. 3. Illustration of the parallel query for $p = 2$ in the suffix tree. The solid lines
represent visited paths in the suffix tree by the left and the right processor. Dotted
arrows represent the followed suffix links. The dictionary contains a record $(\beta_1, \beta_2) \rightarrow \omega$.
Notice the dotted line below the path visited by the right processor shows a similarity
to the concatenated chunks of paths obtained when following the suffix links.


Work Complexity. Assuming each character of a subquery corresponds to one
unit of work, the amount of work to be invested by two processors is
$\frac{1}{2}m + \frac{3}{4}m = \frac{5}{4}m$.

Time Complexity. To evaluate the time complexity of our algorithm, we need
to consider the dependencies of the second processor on the path constructed by
the first processor. There exist two such dependencies: 1) the second processor
requires information on the skip value of the nodes of the first processor in order
to shorten the right subquery accordingly, and 2) if the condition $|\beta_1| < |\beta_2| + d_2$
in Algorithm 4 is true, the second processor needs to wait for the first processor to
navigate the suffix tree and extend the left subquery, which leads to sequential
execution in some cases.

We treat the first dependency as follows. The second processor requires the
skip values of the nodes navigated by the first processor in order to correctly
shorten the right subquery. However, it does not immediately require the skip values
of all the nodes on the first processor's path, but only of the first node that has
a larger cumulative sum than the cumulative sum of the node currently navigated
by the second processor. By using a one node delay, we resolve the dependency
and can still construct both paths in parallel in a pipelined manner.

The second dependency requires inherently sequential execution in the worst
case as a consequence of Lemma [?] and Proposition [?]. First, let the processors
navigate, independently of each other, the first $|Q|/2$ characters of the query. This
costs $|Q|/4$ units of time. Then, let $\beta_2$ denote the current node of the second
processor and assume it corresponds to the substring $Q[\lceil |Q|/4 \rceil + 1, |Q|/2]$, where
the skip value of the next child $d_2 = \mathrm{skipvalue}(\mathrm{child}(\beta_2, Q[|Q|/2 + 1])) > \frac{1}{2}|Q|$. As
a consequence of Proposition [?], the second processor waits for the first processor
to navigate the next node and extend the left subquery. The second processor
shortens the right subquery accordingly, but the required skip value of a new
node, now corresponding to $Q[\lceil |Q|/4 \rceil + 1 + d_1, |Q|/2]$, might still exceed the
length of the left subquery. In the worst case, the first processor processes the
whole left subquery, while the right processor only shortens the right subquery
$|Q|/4$ times, spending $|Q|/4$ time. Finally, when the right subquery was
shortened for the last time, the updated $d_2$ might suddenly become small and
the second processor needs to navigate the rest of the assigned subquery, spending
$|Q|/2$ time in the worst case. Employing any number of processors, we will
always spend $|Q|$ time overall in the worst case.

Theorem 2. Parallel query in the suffix tree using consecutive subqueries and 
linear space is an inherently sequential operation in the worst case. 

5 Interleaved suffix tree 

In the previous section we showed that the parallel query method used in a suffix 
trie does not allow reasonable speed-ups in the suffix tree. Therefore, we explore 
a different approach to the parallel query where, instead of splitting the query 
string into p consecutive substrings, we split it into p-interleaved subsequences. 
To answer such an interleaved query, we need to construct a different, interleaved 
suffix tree and navigate this data structure instead of the original suffix tree. 
Finally, we map the nodes obtained from the interleaved suffix tree to a node in 
the original suffix tree and report the results. 


5.1 k-interleaved string 

Definition 1. Given a string $X$ consisting of $n$ characters, $X = c_1 c_2 \ldots c_n$, we 
define the $k$-interleaved subsequences $X_i$ of string $X$ for all $i \in \{1, \ldots, k\}$ such that 

$X_i = c_i \, c_{i+k} \, c_{i+2k} \cdots c_{i + k\lfloor (n-i)/k \rfloor}$. 

For example, the 2-interleaved subsequences of the string $X$ = ABRACADABRA 
are $X_1$ = ARCDBA and $X_2$ = BAAAR. In the case of $k = 1$, the 1-interleaved 
subsequence $X_1$ is the original string $X$. In the case of $k = n$, we obtain the 
$n$-interleaved subsequences $X_i = c_i$ for $i \in \{1, \ldots, n\}$. 
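
The definition can be checked with a few lines of Python. This is an illustrative sketch only: the function name `interleaved_subsequences` is ours, and Python's 0-based slicing stands in for the 1-based indices used in the text.

```python
def interleaved_subsequences(x, k):
    """Return the k-interleaved subsequences X_1, ..., X_k of the string x.

    X_i (1-based in the text) starts at 0-based offset i - 1 and takes
    every k-th character from there.
    """
    if k < 1:
        raise ValueError("k must be a positive integer")
    return [x[offset::k] for offset in range(k)]


print(interleaved_subsequences("ABRACADABRA", 2))  # ['ARCDBA', 'BAAAR']
```

With `k = 1` the single subsequence is the string itself, and with `k = len(x)` every subsequence is a single character, matching the two boundary cases above.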

To deinterleave $k$-interleaved subsequences and construct the original 
sequence, we take one character at a time from subsequence $X_i$ for $i = 1, \ldots, k$, 
repeatedly, until we reach the end of any subsequence. If the subsequences are 
of arbitrary length, the resulting string consists of prefixes of the given 
subsequences. 

Definition 2. Deinterleaving $k$ subsequences $X_i$ into the resulting string $X$, where 
$|X_i|$ for $i \in \{1, \ldots, k\}$ is arbitrarily long, is done as follows: 

$X[i] = X_{((i-1) \bmod k) + 1}\left[\lfloor (i-1)/k \rfloor + 1\right]$ 

for all $i \in \{1, \ldots, k \cdot \min(|X_1|, \ldots, |X_k|) + \operatorname{argmin}_j |X_j|\}$. 





In the rest of the paper we will always deinterleave 2-interleaved subsequences 
of arbitrary length. Applying Definition 2 with $k = 2$, we observe 
the following. 

Property 1. Given 2-interleaved subsequences $X_1$ and $X_2$ of arbitrary length, 
deinterleaving the subsequences produces a string $X$ of length $|X| = 2 \cdot \min(|X_1|, |X_2|)$ 
if $|X_1| \le |X_2|$, or $|X| = 2 \cdot \min(|X_1|, |X_2|) + 1$ if $|X_1| > |X_2|$. A compact way 
of writing this is $|X| = \min(|X_1|, |X_2|) + \min(|X_1|, |X_2| + 1)$. 
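
A minimal sketch of the deinterleaving procedure, together with a check of the compact length formula. The names `deinterleave` and `deinterleaved_length` are hypothetical, and the round-robin loop is just one straightforward reading of the procedure: take one character from each subsequence in turn and stop at the first subsequence that has run out.

```python
def deinterleave(subsequences):
    """Round-robin merge of X_1, ..., X_k.

    Stops as soon as the subsequence whose turn it is has no character left.
    """
    if not subsequences:
        return ""
    result = []
    position = 0
    while True:
        for sub in subsequences:
            if position >= len(sub):
                return "".join(result)
            result.append(sub[position])
        position += 1


def deinterleaved_length(len1, len2):
    """Compact length formula for k = 2."""
    return min(len1, len2) + min(len1, len2 + 1)


print(deinterleave(["ARCDBA", "BAAAR"]))  # ABRACADABRA
print(deinterleave(["AB", "CDEF"]))       # ACBD
print(deinterleaved_length(6, 5))         # 11
```

In the second call the first subsequence is the shorter one, so the result has length 2 * 2 = 4, while in the first call the first subsequence is longer and the result has length 2 * 5 + 1 = 11.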

5.2 k-interleaved suffix tree 

Definition 3. k-interleaved suffix tree is a suffix tree containing all suffixes 
of all k-interleaved subsequences of the input text T. 

The construction of a $k$-interleaved suffix tree is the same as that of any other suffix 
tree, for example by inserting the suffixes of each $k$-interleaved subsequence $T_i$, for 
$i \in \{1, \ldots, k\}$, of the text $T$. Navigating a $k$-interleaved tree is also the same as 
navigating the original suffix tree. The leaves below the resulting node, however, are 
the locations of the query string in the text where there are $k - 1$ characters in 
the text between each pair of consecutive characters of the query string. 
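
To make the meaning of such a query concrete, the sketch below lists the text positions that a query in the $k$-interleaved suffix tree would report. It is a naive stand-in that scans the interleaved subsequences directly and does not build any tree; the function name `spaced_occurrences` and the 0-based positions are assumptions of this illustration.

```python
def spaced_occurrences(text, query, k):
    """Return the 0-based positions p for which
    text[p], text[p + k], text[p + 2k], ... spell out `query`.

    Naive stand-in for navigating the k-interleaved suffix tree: every
    k-interleaved subsequence of the text is scanned for `query`.
    """
    positions = []
    for offset in range(k):
        subsequence = text[offset::k]
        start = subsequence.find(query)
        while start != -1:
            # Index `start` in the subsequence with this offset is
            # text position offset + start * k.
            positions.append(offset + start * k)
            start = subsequence.find(query, start + 1)
    return sorted(positions)


print(spaced_occurrences("ABRACADABRA", "AA", 2))    # [3, 5]
print(spaced_occurrences("ABRACADABRA", "ABRA", 1))  # [0, 7]
```

For `k = 2` the query AA matches at positions 3 and 5 because the characters at positions 3, 5 and 7 of ABRACADABRA are all A, with one arbitrary character between consecutive matches.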

The delimiter character plays an additional role in the $k$-interleaved suffix tree. 
Recall the delimiter character initially described in Section 2.1, used to discriminate 
between strings where one string is a prefix of the other or, in the case of the 
suffix tree, where one suffix is a prefix of another suffix. In addition to this role 
inherited from the Patricia tree, the $k$-interleaved suffix tree also needs to 
discriminate between equal suffixes which are part of different subsequences and 
consequently appear at different locations in the text. To solve this, we append $k$ 
unique delimiter characters to the input text, so that each subsequence ends with a 
unique delimiter character (see Figure 4). We wrap up this subsection with two 
obvious properties: 

Property 2. The height of the $k$-interleaved suffix tree for an input text of length $n$ 
is at most $\lceil n/k \rceil$. 

Property 3. The number of leaves in the $k$-interleaved suffix tree equals the number 
of leaves in the original suffix tree, excluding the leaves representing the delimiter 
characters. 

Notice that, while the number of leaf nodes stays the same, the maximum height of 
the $k$-interleaved suffix tree is smaller than that of the original suffix tree, 
because the width of the tree has increased due to the new delimiter characters 
in the alphabet. 
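
The role of the delimiters and the two properties can be illustrated at the level of strings, without building the tree. In the sketch below the helper names are hypothetical and the delimiters are taken from the Unicode private-use area, which is purely an assumption of this example (any $k$ characters that do not occur in the text would do).

```python
import math


def delimited_subsequences(text, k):
    """k-interleaved subsequences of `text` after appending k unique
    delimiter characters, so that each subsequence ends with its own one."""
    # Assumption: private-use code points do not occur in `text`.
    delimiters = "".join(chr(0xE000 + d) for d in range(k))
    padded = text + delimiters
    return [padded[offset::k] for offset in range(k)]


def all_suffixes(subsequences):
    """Every suffix that would be inserted into the k-interleaved suffix tree."""
    return [sub[start:] for sub in subsequences for start in range(len(sub))]


text, k = "ABRACADABRA", 2
subs = delimited_subsequences(text, k)
suffixes = all_suffixes(subs)

# Equal suffixes of different subsequences are now distinguishable.
assert len(set(suffixes)) == len(suffixes)
# Property 3: one leaf per text position once the suffixes consisting
# of a delimiter only are excluded.
assert sum(1 for s in suffixes if len(s) > 1) == len(text)
# Property 2: no inserted string has more than ceil(n / k) text characters.
assert max(len(s) for s in subs) - 1 == math.ceil(len(text) / k)
```

Since the last $k$ positions of the padded text cover every residue modulo $k$ exactly once, each subsequence receives exactly one delimiter, and no two subsequences share it.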


5.3 Layered interleaved suffix tree 

The layered interleaved suffix tree data structure consists of $\lg p$ layers, where at 
each layer $k$, for $k \in \{2^0, 2^1, 2^2, \ldots, p\}$, we store the $k$-interleaved suffix 
tree $T^{(k)}$ for the same input text $T$, and a dictionary which maps a pair of nodes 
in layer $k$ to a node in layer $k/2$. The query using $j = 2^x \le p$ processors, for some 
integer $x$, is done by navigating the layer $T^{(j)}$ and then mapping the obtained 
nodes from $T^{(j)}$ to $T^{(j/2)}$ and so on down to $T^{(1)}$. The mapping merges partially 
navigated paths to obtain the resulting node. We discuss the query procedure in 
detail in the next subsection. First, we formally describe how to map from one 
layer to the next one. 

Fig. 4. A 2-interleaved suffix tree for the input text ABRACADABRA. Notice the two 
delimiter characters, $ and %. 

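Lemma 4. For each node $\omega \in T^{(k/2)}$ there exists a pair of nodes 
$\omega_1, \omega_2 \in T^{(k)}$, where $T^{(k)}$ denotes the $k$-interleaved suffix tree, 
such that deinterleaving the longest substrings which $\omega_1$ and $\omega_2$ correspond to 
results in one of the substrings which $\omega$ corresponds to. 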

Proof. Without loss of generality we set $k = 2$. Assume one of the substrings 
which $\omega$ corresponds to is the substring $T[a, a + \ell]$ of the input text, where 
$|\omega| - \mathrm{skipvalue}(\omega) < \ell \le |\omega|$. Then there exist two 2-interleaved 
subsequences of $T$: 

$X_1 = T[a] \, T[a+2] \cdots T[a + 2\lfloor \ell/2 \rfloor]$ 

$X_2 = T[a+1] \, T[a+3] \cdots T[a + 2\lfloor (\ell-1)/2 \rfloor + 1]$. 

Notice that $X_2$ is empty if $\mathrm{skipvalue}(\omega) = 1$. 

$X_1$ and $X_2$ are prefixes of two suffixes contained in $T^{(2)}$. Take the leaves 
corresponding to these suffixes and, on the path from the root to each leaf, find 
the first nodes $\omega_1$ and $\omega_2$, respectively, whose cumulative sums satisfy 
$|\omega_1| \ge |X_1|$ and $|\omega_2| \ge |X_2|$. It follows that $\omega_1$ and $\omega_2$ exist in $T^{(2)}$ 
such that deinterleaving their corresponding longest substrings yields a string of 
$|\omega_1| + |\omega_2| \ge \ell$ characters, and not shorter. This proves that $\omega$ 
or one of its descendants corresponds to the deinterleaved string. 

$\omega$ discriminates between at least two substrings in $T$ whose characters 
differ at position $|\omega| + 1$. Consequently, $\omega_1$ or $\omega_2$ must also discriminate between 
at least two subsequences whose characters differ at position $\lfloor |\omega|/2 \rfloor + 1$. 
Following Definition 2, the length of the resulting string depends on the length 
of the shorter subsequence, formally 

$|X| = \min(|X_1|, |X_2|) + \min(|X_1|, |X_2| + 1)$ 

$= \min(|\omega_1|, |\omega_2|) + \min(|\omega_1|, |\omega_2| + 1) \le |\omega|$. 

This proves that $\omega$ or one of its predecessors corresponds to the resulting 
string. Since we excluded the predecessors in the previous paragraph, we conclude 
that exactly $\omega$ corresponds to the deinterleaving of $\omega_1$ and $\omega_2$. □ 

Each layer $k$ in the layered interleaved suffix tree contains a dictionary which 
maps a pair of nodes $\omega_1, \omega_2$ in the $k$-interleaved suffix tree $T^{(k)}$ to a node 
$\omega$ in $T^{(k/2)}$ according to Lemma 4, for all nodes in $T^{(k/2)}$. Intuitively, the 
mapping efficiently deinterleaves the longest subsequences corresponding to $\omega_1$ 
and $\omega_2$ into the node $\omega$ corresponding to the resulting string. In turn, the 
mapping fixes the $k - 1$ characters in the text between each character of the 
query string, which arise when we query the $k$-interleaved suffix tree as 
described in the previous subsection. 

To construct the dictionary at each layer, we need to consider which pairs 
$(\omega_1, \omega_2) \to \omega$ to store. Because of the path compression, node $\omega$ corresponds to 
$O(n)$ substrings. Storing every pair of nodes $\omega_1, \omega_2$ whose corresponding 
longest substrings deinterleave to one of the substrings corresponding to $\omega$ would 
require $O(n^2)$ space per layer. We use the same reasoning as in the propositions 
of the previous section: we find and store the pair of nodes whose 
corresponding substrings deinterleave to the shortest substring $\omega$ corresponds 
to. 

Definition 4. The dictionary of the $k$-interleaved suffix tree $T^{(k)}$ is a dictionary which, 
for each node $\omega \in T^{(k/2)}$, maps a pair of nodes $\omega_1, \omega_2 \in T^{(k)}$ to $\omega$ for 
the smallest $|\omega_1|$ and $|\omega_2|$ such that deinterleaving the longest substrings 
which $\omega_1$ and $\omega_2$ correspond to results in one of the substrings $\omega$ corresponds to. 

By storing such pairs, we ensure that the nodes $\omega_1, \omega_2$ are always reachable by the 
shortest possible query corresponding to $\omega$. This, however, imposes additional 
work during the query phase in order to correctly find the mappings for longer 
query strings, which we describe next. 

5.4 Parallel Query in Layered Interleaved Suffix Tree 

We assign the $p$-interleaved subsequences of $Q$ to processors $1, \ldots, p$, respectively. 
Then, each processor independently navigates the $p$-interleaved suffix tree $T^{(p)}$ 
according to its assigned subsequence. 

Let $\pi_i^{(j)}$ denote the path navigated by processor $i$ in $T^{(j)}$. Initially we have 
navigated the paths $\pi_i^{(p)}$. We need to interleave the longest substrings which the nodes 
on these paths correspond to and obtain the paths at layer $p/2$. At each layer 
$j$, the interleaving is done pairwise as follows. We interleave each path $\pi_i^{(j)}$ with 
$\pi_{i+1}^{(j)}$ for $i \in \{1, 3, \ldots, j - 1\}$ and obtain $j/2$ new paths at layer $j/2$. We 
continue interleaving the paths recursively until we reach the first layer, that is, the 
original suffix tree. 
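
The recursion can be mimicked at the level of plain strings, which shows why $\lg p$ pairwise rounds suffice to recover the query. The sketch below is only a string-level model: it merges strings rather than suffix tree paths, all names are hypothetical, and the order in which the subsequences are handed to the processors (chosen here so that neighbouring pairs are the ones that deinterleave together) is an assumption of the sketch, as is $p$ being a power of two.

```python
def deinterleave2(left, right):
    """Deinterleave two subsequences: left[0], right[0], left[1], ..."""
    out = [a + b for a, b in zip(left, right)]
    if len(left) > len(right):
        out.append(left[len(right)])
    return "".join(out)


def split_for_processors(query, p):
    """Split `query` into its p-interleaved subsequences, ordered so that
    neighbours (1, 2), (3, 4), ... are merged with each other.
    Assumes p is a power of two."""
    if p == 1:
        return [query]
    half = p // 2
    return (split_for_processors(query[0::2], half)
            + split_for_processors(query[1::2], half))


def merge_layers(parts):
    """Pairwise deinterleave until one string is left.
    Returns the string and the number of rounds performed."""
    rounds = 0
    while len(parts) > 1:
        parts = [deinterleave2(parts[i], parts[i + 1])
                 for i in range(0, len(parts), 2)]
        rounds += 1
    return parts[0], rounds


parts = split_for_processors("ABRACADABRA", 4)
print(parts)                # ['ACB', 'RDA', 'BAR', 'AA']
print(merge_layers(parts))  # ('ABRACADABRA', 2)
```

The first round produces the two 2-interleaved subsequences ARCDBA and BAAAR, and the second round produces the query itself.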

To efficiently interleave all substrings which the nodes of two paths $\pi_i^{(j)}$ and 
$\pi_{i+1}^{(j)}$ correspond to, we probe the dictionary at layer $j$ described in the previous 
subsection. The probed key consists of one node from $\pi_i^{(j)}$ and a second node 
from $\pi_{i+1}^{(j)}$. If the probe is successful, we append the resulting node to the new 
path at layer $j/2$, in increasing cumulative skip value order. 

Next, we explore the number of probes in the dictionary required 
to construct a valid path at the next layer. The trivial upper bound, probing all 
possible pairs of nodes in both paths, is $O((m/j)^2)$, where $m$ denotes the original 
query length. We reduce the number of probes to $O(m/j)$ in the following way. 

Lemma 5. Deinterleaving paths $\pi_i^{(j)}$ and $\pi_{i+1}^{(j)}$ requires 
$|\pi_i^{(j)}| + |\pi_{i+1}^{(j)}|$ probes in the dictionary. 

Proof. We start with $\omega_1 \leftarrow \pi_i^{(j)}[1]$ and $\omega_2 \leftarrow \pi_{i+1}^{(j)}[1]$. Then, we compare the 
cumulative skip values of the children of $\omega_1$ and $\omega_2$, and we follow the edge to the child 
with the smaller one, due to Property 1. If the values of both children are equal, 
we follow the edge to the child of $\omega_1$. We repeat the procedure until we reach the 
end of both paths. Each time we follow an edge, we perform a probe $(\omega_1, \omega_2)$ to the 
dictionary. Since each time we advance by one node, the procedure requires exactly 
$|\pi_i^{(j)}| + |\pi_{i+1}^{(j)}|$ probes to the dictionary. □ 
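
The walk in the proof is essentially the merge step of merge sort. A minimal sketch follows, under several assumptions of ours: each path is given as the increasing list of cumulative skip values of its non-root nodes, both walks start at the shared root, and the dictionary lookup itself is left out so that only the sequence of probed pairs is produced. The name `merge_probes` is hypothetical.

```python
def merge_probes(path1, path2):
    """Return the probes made while deinterleaving two paths.

    path1, path2: increasing cumulative skip values of the non-root nodes.
    Each probe is a pair (i, j): the number of nodes already followed on
    each path, with 0 standing for the shared root.
    """
    i = j = 0
    probes = []
    while i < len(path1) or j < len(path2):
        next1 = path1[i] if i < len(path1) else float("inf")
        next2 = path2[j] if j < len(path2) else float("inf")
        if next1 <= next2:   # on a tie, advance on the first path
            i += 1
        else:
            j += 1
        probes.append((i, j))  # one dictionary probe per followed edge
    return probes


print(merge_probes([2, 4, 5], [1, 4]))
# [(0, 1), (1, 1), (2, 1), (2, 2), (3, 2)]
```

Every iteration advances on exactly one of the two paths, so the number of probes equals the sum of the two path lengths (here 3 + 2 = 5) instead of their product.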

The deinterleaving procedure introduced in the proof of Lemma 5 is embarrassingly 
parallel. Each of the $j$ processors is assigned $m/j$ consecutive nodes of the paths. 
Notice that the paths need to be aligned according to the cumulative skip value of 
the nodes. This way, we optimally employ $j$ processors and construct the path 
at the next layer. Figure 5 illustrates the navigation in the 2-interleaved suffix tree 
and the required probes in the dictionary. 


Fig. 5. Dictionary probes during the parallel query in the 2-interleaved suffix tree for $p = 2$. 
The two paths in the figure are the nodes visited by processors 1 and 2. $\alpha$ is the root 
node of the 2-interleaved suffix tree, common to both processors, while the other nodes are 
separate. Numbers inside the nodes denote the cumulative skip value of the node. Dotted 
lines illustrate the probes to the dictionary during the parallel query. 


Finally, once we have constructed a path at the first layer $T^{(1)}$, the deepest node on 
this path is the resulting node of the query. We check whether the characters skipped 
during the query match the query string and report all the leaves of the subtree 
rooted at the resulting node. 

Time and Work Complexity. The assignment of interleaved subqueries to $p$ 
processors can be done implicitly. Each processor navigates the $p$-interleaved 
suffix tree independently, which requires $O(m/p)$ time and $O(m)$ work. Finally, 
deinterleaving the intermediate paths requires $O(\frac{m}{p} \lg p)$ time on CREW PRAM and 
$O(m \lg p)$ work. 

In detail, we deinterleave each pair of paths at the top layer in parallel as 
described before. Also, each pair of paths is deinterleaved independently of the 
others. This way we employ all $p$ processors and spend $O(m/p)$ time. Notice that 
each probe to the dictionary takes constant time when a perfect hash table is used. At 
the next layer, we deinterleave $p/2$ paths of length $\le 2m/p$. In general, at each 
layer we use $p$ processors and spend $O(m/p)$ time. After $\lg p$ steps, we have constructed 
the final path in the original suffix tree. 
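
The accounting above can be written down as a small unit-cost model. This is a sketch only: all hidden constants are set to 1, $p$ is assumed to be a power of two, and the function name `query_cost` is ours.

```python
import math


def query_cost(m, p):
    """Unit-cost model of the interleaved query.

    m: query length, p: number of processors (assumed a power of two).
    Returns (time, work) with all hidden constants set to 1.
    """
    layers = p.bit_length() - 1    # lg p for a power of two
    per_round = math.ceil(m / p)   # steps of one processor in one round
    rounds = 1 + layers            # navigation, then lg p merge rounds
    time = per_round * rounds
    work = p * per_round * rounds
    return time, work


print(query_cost(1024, 8))  # (512, 4096)
print(query_cost(1024, 1))  # (1024, 1024)
```

In this model the time is $(m/p)(1 + \lg p)$ and the work is $m(1 + \lg p)$, which agrees with the $O(\frac{m}{p} \lg p)$ time and $O(m \lg p)$ work stated above.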

Space Complexity. Each $k$-interleaved suffix tree requires $O(n)$ space. The 
layered interleaved suffix tree consists of $\lg p$ layers of $k$-interleaved suffix trees 
and dictionaries, requiring $O(n \lg p)$ space overall. Notice that if we support the parallel 
query for a fixed $p$ only, and not for smaller values, we need to keep only the $p$-interleaved 
suffix tree at layer $p$, $\lg p - 1$ dictionaries, and the original suffix tree. Asymptotically this 
still requires $O(n \lg p)$ space. The intermediate paths during the query procedure 
require $O(m + p)$ temporary space. 

Theorem 3. Parallel query in the layered interleaved suffix tree requires $O(\frac{m}{p} \lg p)$ 
time on CREW PRAM and $O(n \lg p)$ space. 

6 Conclusion 

We have explored a parallel query in suffix tree based data structures where the 
number of processors $p \ll n$. We have presented two algorithms. The first one 
uses a suffix trie as its data structure. It splits the query string into $p$ consecutive 
subqueries, navigates the underlying suffix trie, and merges the intermediate 
nodes into the final node. This requires $O(m + p)$ work, $O(m/p + \lg p)$ time and 
$O(n^2)$ space in the worst case. The second algorithm extends the same principle 
to suffix trees, where we correctly solve the issues concerning the path compression. 
However, the subquery overlapping required to solve the path compression issues 
introduces dependencies between the navigated paths. In the worst case, no 
parallelism can be employed and an inherently sequential execution is performed. 
Finally, we have presented the layered interleaved suffix tree data structure. Instead 
of processing the query as consecutive subqueries, we use interleaving. 
This approach requires $O(m \lg p)$ work, $O(\frac{m}{p} \lg p)$ time and $O(n \lg p)$ space in 
the worst case. To the best of our knowledge, the presented algorithms are the 
first parallel algorithms for pattern matching whose amount of work is not 
related to $n$. Previous solutions assumed that the number of processors $p$ is of 
the order of $n$. 

It remains an open question whether we have reached the space-work lower 
bound for the pattern matching problem. We also have not provided any parallel 
cache complexity analysis of our query algorithms, perhaps using the 
cache-oblivious string dictionary [3] or the string B-tree m as a starting point. From 
the applied point of view, an interesting research question would be whether the 
presented algorithms improve cache performance in comparison to the traditional 
sequential query, since each processor accesses $O(m/p + \lg p)$ nodes instead of 
$O(m)$ during the suffix tree navigation and there is a better chance that the accessed 
nodes are already in the cache due to temporal locality. 